Method and system for regression-based object detection in medical images

ABSTRACT

A method and system for regression-based object detection in medical images is disclosed. A regression function for predicting a location of an object in a medical image based on an image patch is trained using image-based boosting ridge regression (IBRR). The trained regression function is used to determine a difference vector based on an image patch of a medical image. The difference vector represents the difference between the location of the image patch and the location of a target object. The location of the target object in the medical image is predicted based on the difference vector determined by the regression function.

This application claims the benefit of U.S. Provisional Application No. 60/849,936, filed Oct. 6, 2006, the disclosure of which is herein incorporated by reference.

BACKGROUND OF THE INVENTION

The present invention relates to object detection in medical images, and more particularly, to a regression method for detecting anatomic structures in medical images.

Detecting anatomic structures in medical images, such as ultrasound, X-rays, CT images, MR images, etc., is important for medical image understanding. For example, in order to segment an anatomic structure in a medical image, conventional techniques typically involve anatomic structure detection followed by database-guided segmentation of the anatomic structure. Anatomic structure detection can also provide valuable initialization information for other segmentation techniques such as level set, active contour, etc.

Conventional object detection techniques used to detect anatomic structures in medical images utilize a classifier-based object detection approach. Such a classifier-based object detection approach first trains a binary classifier, discriminating the anatomic structure from the background, and then exhaustively scans the query image for anatomy targets. If the trained classifier is denoted by the posterior probability p(O|I), the scanning procedure mathematically performs one of the following two tasks:

$\begin{matrix}{\text{find }\left\{ {\theta:{p\left( O \middle| I(\theta) \right)} > 0.5;\ \theta \in \Theta} \right\}} & (1) \\{\hat{\theta} = \arg\max\limits_{\theta \in \Theta}\ p\left( O \middle| I(\theta) \right),} & (2)\end{matrix}$
where I(θ) is an image patch parameterized by θ, and Θ is the parameter space where the search is conducted. In (1), multiple objects are detected, and in (2), one object is detected.

The above described conventional approach reaches real time performance for detecting objects in non-medical images. In P. Viola et al., “Rapid Object Detection Using a Boosted Cascade of Simple Features,” In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 511-518, 2001, which is incorporated herein by reference, the classifier-based approach is used for real time frontal view face detection by exhaustively searching all possible translations and a sparse set of scales. Other approaches that detect objects under in-plane/out-of-plane rotations have been proposed, but only a sparse set of orientations and scales are tested in order to meet the real time requirement. In such approaches, either multiple classifiers are learned, or one classifier is learned but multiple integral images according to different rotations are computed. In general, the computational complexity of the classifier-based approach linearly depends on the image size (for the translation parameter) and the number of tested orientations and scales.

Medical anatomy often manifests an arbitrary orientation and scale within medical images. In order to perform subsequent tasks, such as object segmentation, an accurate representation of orientation and scale may be required. Accordingly, detection speed must be sacrificed for testing a dense set of orientations and scales when using conventional detection approaches. Further, if the one classifier approach, which has been shown to perform better than the multiple-classifier approach, is used, then rotating the images and computing their associated integral images costs extra computation. Therefore, due to the exhaustive nature of the classifier-based approach, it is challenging to build a rapid detector for medical anatomy using the classifier-based approach.

BRIEF SUMMARY OF THE INVENTION

The present invention provides a regression method for object detection in medical images. Embodiments of the present invention can detect objects, such as anatomic structures, by examining one or more sparsely sampled windows instead of exhaustively scanning an image.

In one embodiment of the present invention, at least one image patch of a medical image is received. A difference vector representing a difference between the location of the image patch and a location of a target object is determined using a trained regression function. The regression function can be trained using image-based boosting ridge regression (IBRR). The location of the target object is predicted based on the difference vector. It is possible that multiple image patches are received and the multiple difference vectors corresponding to the image patches can result in multiple predictions for the location of the target object. These predictions can be averaged to determine a final estimate of the location of the target object.

In another embodiment of the present invention, training data including image patches of a training image and corresponding difference vectors can be received. A regression function for predicting a location of a target object in a medical image can be trained based on the received training data. IBRR can be used to train the regression function.

These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates the concept of regression-based anatomic structure detection according to embodiments of the present invention;

FIG. 2 illustrates a method for training a regressive function for anatomic structure detection according to an embodiment of the present invention;

FIG. 3 illustrates exemplary training data;

FIGS. 4A and 4B illustrate a 1D decision stump and a regression stump, respectively;

FIG. 5A illustrates a method of training a regression function using image-based boosting ridge regression according to an embodiment of the present invention;

FIG. 5B is pseudo code for implementing the method of FIG. 5A;

FIG. 6A illustrates a feature selection method according to an embodiment of the present invention;

FIG. 6B is pseudo code for implementing the method of FIG. 6A;

FIG. 7 illustrates a method for detecting a target object in a medical image using a trained regression function according to an embodiment of the present invention;

FIG. 8 illustrates exemplary anatomic structure detection results using a 2D parameterization according to an embodiment of the present invention;

FIG. 9 illustrates exemplary anatomic structure detection results using a 5D parameterization according to an embodiment of the present invention; and

FIG. 10 is a high level block diagram of a computer capable of implementing the present invention.

DETAILED DESCRIPTION

The present invention is directed to a regression method for detecting anatomic structures in medical images. Embodiments of the present invention are described herein to give a visual understanding of the anatomic structure detection method. A digital image is often composed of digital representations of one or more objects (or shapes). The digital representation of an object is often described herein in terms of identifying and manipulating the objects. Such manipulations are virtual manipulations accomplished in the memory or other circuitry/hardware of a computer system. Accordingly, it is to be understood that embodiments of the present invention may be performed within a computer system using data stored within the computer system.

An embodiment of the present invention in which a regression function is trained and used to detect a left ventricle in an echocardiogram is described herein. It is to be understood that the present invention is not limited to this embodiment and may be used for detection of various objects and structures in various types of image data.

FIG. 1 illustrates the concept of regression-based anatomic structure detection according to embodiments of the present invention. Images (a) and (b) of FIG. 1 are 2D echocardiogram images of an apical four chamber (A4C) view. A 2D echocardiogram is a 2D slice of the human heart captured by an ultrasound imaging device, and the A4C view is a canonical slice where all four chambers of the heart, namely, left ventricle (LV), right ventricle (RV), left atrium (LA), and right atrium (RA), are visible. In image (a) of FIG. 1, we are interested in detecting the center position (t_(x,0),t_(y,0)) of the LV 102 in the A4C echocardiogram, assuming that the orientation of the LV is upright and the scale/size of the LV is fixed. Given an image patch I(t_(x),t_(y)) centered at position θ=(t_(x),t_(y)), it is possible, according to embodiments of the present invention, to determine a regression function F that acts as an oracle to predict the target position (t_(x,0),t_(y,0)) by calculating a difference vector dθ=(dt_(x),dt_(y)) between the current position θ and the target position θ₀=(t_(x,0),t_(y,0)), i.e., dθ=θ₀−θ, or

$\begin{matrix}{{dt_{x} = t_{x,0} - t_{x}},\quad{dt_{y} = t_{y,0} - t_{y}}.} & (3)\end{matrix}$

Using such an oracle, it is possible to achieve detection of the target anatomic structure (e.g., the LV 102) using just one scanned image patch. As used herein, the term “image patch” refers to a region of an image. The image region can be at various locations in a medical image, and can have various sizes and orientations, depending on the image patch parameter θ. In other words, the oracle (regression function) provides a mapping F:I→(dt_(x),dt_(y)), i.e.,

$\begin{matrix}{{\left( dt_{x},dt_{y} \right) = F\left( I\left( t_{x},t_{y} \right) \right)}\quad\text{or}\quad{d\theta = F\left( I(\theta) \right)},} & (4)\end{matrix}$
and the ground truth (target) position is estimated as:

$\begin{matrix}{\hat{\theta} = \theta + F\left( I(\theta) \right).} & (5)\end{matrix}$
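The one-patch prediction of equation (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `make_ideal_oracle` and `predict_target` are hypothetical names, and the "oracle" here is a stand-in that already knows the target position, whereas a trained regression function F would have to infer the difference vector from the patch content.

```python
def make_ideal_oracle(target):
    """Hypothetical stand-in for a trained F: returns the exact difference vector."""
    def oracle(image, theta):
        # A trained F would inspect the patch I(theta); this stand-in ignores the image.
        return tuple(t0 - t for t0, t in zip(target, theta))
    return oracle


def predict_target(image, theta, oracle):
    """Equation (5): theta_hat = theta + F(I(theta))."""
    d_theta = oracle(image, theta)
    return tuple(t + d for t, d in zip(theta, d_theta))


oracle = make_ideal_oracle(target=(64.0, 48.0))
estimate = predict_target(image=None, theta=(10.0, 20.0), oracle=oracle)
```

With an ideal oracle a single patch recovers the target exactly; with a learned, imperfect F the estimate carries the regression error of that patch.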

As described above and shown in image (a) of FIG. 1, the regression function is applied for 2D parameterization under the assumption that the orientation of the LV is upright and the scale/size of the LV is fixed. Image (b) of FIG. 1 shows a 5D parameterization for modeling the LV 104 without these assumptions. The 5D parameterization used is θ=(t_(x),t_(y),log(s_(x)),log(s_(y)),α): (t_(x),t_(y)) for translation, α for orientation, and (s_(x),s_(y)) for scale (or size) in the x and y directions. Due to the multiplicative nature of the scale parameter, we take the logarithm to make it additive. The difference vector dθ=(dt_(x),dt_(y),ds_(x),ds_(y),dα) is given as:

$\begin{matrix}{{dt_{x} = t_{x,0} - t_{x}},\quad{dt_{y} = t_{y,0} - t_{y}},\quad{d\alpha = \alpha_{0} - \alpha},} & \\{{ds_{x} = \log\left( s_{x,0} \right) - \log\left( s_{x} \right)},\quad{ds_{y} = \log\left( s_{y,0} \right) - \log\left( s_{y} \right)},} & (6)\end{matrix}$
where θ₀=(t_(x,0),t_(y,0),s_(x,0),s_(y,0),α₀) is the ground truth parameter of the target.
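As a small illustration of equation (6), the following Python sketch computes the 5D difference vector and applies it back to a patch parameter. The function names are hypothetical, and the sketch assumes the tuples store the raw scales (s_x, s_y), taking logarithms internally.

```python
import math


def difference_vector_5d(theta, theta0):
    """Equation (6): returns (dt_x, dt_y, ds_x, ds_y, d_alpha)."""
    tx, ty, sx, sy, alpha = theta
    tx0, ty0, sx0, sy0, alpha0 = theta0
    return (
        tx0 - tx,
        ty0 - ty,
        math.log(sx0) - math.log(sx),  # scale differences live in the log domain
        math.log(sy0) - math.log(sy),
        alpha0 - alpha,
    )


def apply_difference_5d(theta, d_theta):
    """Inverse step: additive in translation and angle, multiplicative in scale."""
    tx, ty, sx, sy, alpha = theta
    dtx, dty, dsx, dsy, dalpha = d_theta
    return (tx + dtx, ty + dty, sx * math.exp(dsx), sy * math.exp(dsy), alpha + dalpha)
```

Because the scale terms are differences of logarithms, adding ds_x in the log domain corresponds to multiplying s_x by exp(ds_x), which is why the scale can be handled by the same additive regression as translation and orientation.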

Due to fundamental differences between generic object detection and medical anatomy detection, it is possible to learn a regression function to act as an oracle as described above. Unlike general object detection that needs to detect object instances from unconstrained scenes, medical anatomy detection applies to more constrained medical images, such as the echocardiograms shown in FIG. 1. As a result, in generic object detection, an unknown number of objects can appear at arbitrary locations in images with arbitrary background; while in medical anatomy detection, since the anatomic structure of interest conforms with the human body atlas, there is a known number of objects appearing within a pattern of geometric and appearance contexts. In medical images, there is often only one anatomic structure of interest. For example, in the echocardiograms shown in FIG. 1, there is only one target LV available, and its relation with respect to other structures, such as the LA, RV, and RA is fixed. Also, there exists a strong correlation between the relative appearances of these structures in different echocardiogram images. Accordingly, by knowing where the LA, RV, and RA are, it is possible to predict the position of the LV quite accurately.

It is well known to utilize a medical atlas as an explicit source of prior knowledge about the location, size, and shape of anatomic structures. Such a medical atlas is deformed in order to match the image content of a target image for segmentation, tracking, etc. However, embodiments of the present invention utilize atlas knowledge in an implicit approach by embedding the atlas knowledge in a learning framework for learning the regression function. Once the regression function is trained, the atlas knowledge is reflected in the regression function and the atlas does not need to be stored any longer.

FIG. 2 illustrates a method for training a regression function for anatomic structure detection according to an embodiment of the present invention. This method uses machine learning based on an annotated database in order to train the regression function.

At step 202, training data is received. FIG. 3 illustrates exemplary training data for LV detection in an echocardiogram. As illustrated in FIG. 3, the training data includes multiple input-output pairs. Each input-output pair consists of an image patch and a difference vector corresponding to the image patch. The image patches are various local image portions of a training image, and the difference vector is the difference in the location of the center of the corresponding image patch from the center of the target anatomic structure in the training image. The training data in FIG. 3 is illustrated using the 2D parameterization assuming fixed scale and orientation for the LV. The 5D parameterization can also be used as described above. In the case of the 5D parameterization, each difference vector contains parameters for scale and orientation in addition to the position (translation) parameters shown in FIG. 3.
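The input-output pairs described above can be illustrated with a short Python sketch for the 2D parameterization. The sampling strategy (uniform random offsets around the annotated center), the function names, and the image representation (a list of rows) are all assumptions made for illustration; the text does not specify how patch locations are chosen.

```python
import random


def crop(image, cx, cy, half):
    """Square patch of side 2*half+1 centered at (cx, cy); assumes it fits in the image."""
    return [row[cx - half: cx + half + 1] for row in image[cy - half: cy + half + 1]]


def make_training_pairs(image, target, half, count, max_offset, seed=0):
    """Return (patch, difference vector) pairs for one annotated training image."""
    rng = random.Random(seed)
    tx0, ty0 = target
    pairs = []
    for _ in range(count):
        # Hypothetical sampling: shift the patch center randomly around the target.
        tx = tx0 + rng.randint(-max_offset, max_offset)
        ty = ty0 + rng.randint(-max_offset, max_offset)
        d_theta = (tx0 - tx, ty0 - ty)  # equation (3)
        pairs.append((crop(image, tx, ty, half), d_theta))
    return pairs
```

Each pair stores what the regression function sees (the patch) together with what it should output (the offset from the patch center to the annotated target center).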

At step 204, a regression function is trained using image-based boosting ridge regression (IBRR) based on the training data. The regression function acts as an oracle for predicting a location of a target anatomic structure based on input image patches. As used herein, the location of a target anatomic structure refers to the position, size, and orientation of the anatomic structure. In order to train the regression function, IBRR is used to generate the regression function from multiple weak functions representing image features of the training image. IBRR is described in greater detail below by first describing an image-based boosting regression (IBR) training method, and then describing the IBRR training method.

The following notation is used herein for describing IBR: a is a scalar, a is a column vector, and A is a matrix. The input is denoted by x∈R^(d), the output by y(x)∈R^(q), the regression function by g(x): R^(d)→R^(q) and the training data points by {(x_(n),y_(n)); n=1, 2, . . . , N}. Further, we denote x^(T)Ax=∥x∥_(A) ² and tr{X^(T)AX}=∥X∥_(A) ². In medical anatomy detection, x=I is the image, y=dθ is the difference vector, and the regression function g(x)=F(I) is the oracle.

IBR minimizes the following cost function, which combines a regression output fidelity term and a subspace regularization term:

$\begin{matrix}{{J(g)} = {\sum\limits_{n = {1:N}}\left\{ {\left\| {y\left( x_{n} \right) - g\left( x_{n} \right)} \right\|_{A}^{2} + \lambda\left\| {\mu - g\left( x_{n} \right)} \right\|_{B}^{2}} \right\}},} & (7)\end{matrix}$
where λ is a regularization coefficient.

IBR assumes that the regression output function g(x) takes an additive form:

$\begin{matrix}{{{g_{t}(x)} = {{{g_{t - 1}(x)} + {\alpha_{t}{h_{t}(x)}}} = {\sum\limits_{i = {1:t}}{\alpha_{i}{h_{i}(x)}}}}},} & (8)\end{matrix}$
where each h_(i)(x): R^(d)→R^(q) is a weak learner (or weak function) residing in a dictionary set H, and g(x) is a strong learner (or strong function). Boosting is an iterative algorithm that leverages the additive nature of g(x): at iteration t, one more weak function α_(t)h_(t)(x) is added to the target function g_(t)(x). Accordingly,

$\begin{matrix}{{J\left( g_{t} \right)} = {\sum\limits_{n = {1:N}}\left\{ {\left\| {r_{t}\left( x_{n} \right) - \alpha_{t}h_{t}\left( x_{n} \right)} \right\|_{A}^{2} + \lambda\left\| {s_{t}\left( x_{n} \right) - \alpha_{t}h_{t}\left( x_{n} \right)} \right\|_{B}^{2}} \right\}},} & (9)\end{matrix}$
where r_(t)(x)=y(x)−g_(t−1)(x) and s_(t)(x)=μ−g_(t−1)(x).

The optimal weak function ĥ (dropping the subscript t for notational clarity) and its weight coefficient α̂, which maximally reduce the cost function (or boost the performance), are given as follows:

$\begin{matrix}{{\hat{h} = \arg\max\limits_{h \in H}\ \varepsilon(h)},\quad{{\hat{\alpha}\left( \hat{h} \right)} = \frac{{tr}\left\{ {\left( {AR + \lambda BS} \right){\hat{H}}^{T}} \right\}}{\left\| \hat{H} \right\|_{A + \lambda B}^{2}}},\quad\text{where}} & (10) \\{{\varepsilon(h)} = \frac{{tr}\left\{ {\left( {AR + \lambda BS} \right)H^{T}} \right\}}{\sqrt{\left\| H \right\|_{A + \lambda B}^{2}}\sqrt{\left\| R \right\|_{A}^{2} + \lambda\left\| S \right\|_{B}^{2}}},} & (11)\end{matrix}$
and the matrices R_(q×N), S_(q×N), and H_(q×N) are defined as: R=[r(x₁), . . . , r(x_(N))], S=[s(x₁), . . . , s(x_(N))], H=[h(x₁), . . . , h(x_(N))]. Finally, IBR invokes shrinkage (with the shrinkage factor η=0.5) leading to a smooth output function: g_(t)(x)=g_(t−1)(x)+ηα_(t)h_(t)(x).

As described in S. Zhou et al., “Image-based Regression Using Boosting Method,” In Proc. ICCV, 2005, which is incorporated herein by reference, an over-complete image feature representation based on local rectangle features is used to construct one-dimensional (1D) decision stumps as primitives of the dictionary set H. This construction enables robustness to appearance variation and fast computation. Each local rectangle image feature has its own attribute μ, namely feature type and window/size.

A 1D decision stump h(x) is associated with a local rectangle feature f(x;μ), a decision threshold ε, and a binary direction indicator p, i.e., p∈{−1,+1}. FIG. 4A illustrates a 1D decision stump. Such a 1D decision stump can be expressed as:

$\begin{matrix}{{h\left( {x;\mu} \right)} = \left\{ \begin{matrix}{+ 1} & {\text{if }p\,f\left( {x;\mu} \right) \geq p\,\epsilon} \\{- 1} & \text{otherwise}\end{matrix} \right..} & (12)\end{matrix}$

Given a moderate image size, a large number of image features can be generated by varying their attributes. The number of features can be denoted by M. By adjusting the threshold ε, e.g., to K evenly spaced levels, K decision stumps can be created per feature and per direction indicator p, such that 2KM 1D decision stumps are created.
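The 1D decision stump of equation (12) and the resulting dictionary size can be sketched as follows. This is a minimal Python illustration; the function names are hypothetical, and the feature response f(x;μ) is assumed to have been computed already and passed in as a number.

```python
def decision_stump(feature_value, threshold, direction):
    """Equation (12): +1 if p*f(x; mu) >= p*epsilon, otherwise -1."""
    return 1 if direction * feature_value >= direction * threshold else -1


def count_decision_stumps(num_features, num_levels):
    """M features, K threshold levels, and two directions give 2*K*M stumps."""
    return 2 * num_levels * num_features


def count_weak_functions(num_features, num_levels, q):
    """Stacking q independent stumps gives (2*K*M)**q candidate weak functions."""
    return count_decision_stumps(num_features, num_levels) ** q
```

Even modest hypothetical values, such as M=1000 features, K=8 levels, and q=2 output dimensions, yield 256,000,000 candidate weak functions, which shows why an exhaustive search over the stacked stumps is impractical.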

A weak function is constructed as a q-dimensional (q-D) decision stump h(x)_(q×1) that stacks q 1D decision stumps. This can be expressed as:

$\begin{matrix}{{h\left( {x;\mu_{1},\ldots,\mu_{q}} \right)} = \left\lbrack {{h_{1}\left( {x;\mu_{1}} \right)},\ldots,{h_{q}\left( {x;\mu_{q}} \right)}} \right\rbrack^{T}.} & (13)\end{matrix}$

Because each h_(j)(x;μ_(j)) is associated with a different parameter, it is possible to construct a sufficiently large weak function set that contains (2KM)^(q) weak functions.

Boosting acts as a feature selector, such that at each round of boosting, the features that maximally decrease the cost function (9) are selected. However, to transform the boosting algorithm into an efficient implementation, a computational bottleneck must be addressed, namely the maximization task in (10). This maximization task necessitates a greedy feature selection scheme, which can be too expensive to evaluate because it involves evaluating (2KM)^(q) decision stumps for each boosting iteration.

IBR utilizes an incremental feature selection scheme by breaking the q-D regression problem into q dependent 1D regression problems. Using the incremental vector:

$\begin{matrix}{{h^{i}(x)}_{i \times 1} = \left\lbrack {{h_{1}(x)},\ldots,{h_{i}(x)}} \right\rbrack^{T} = \left\lbrack {{h^{i - 1}(x)}^{T},{h_{i}(x)}} \right\rbrack^{T},} & (14)\end{matrix}$
the optimal h_(i)(x) is searched to maximize ε(h^(i)), which is defined similarly to (11) but based on all i (i≤q) dimensions processed so far. The incremental selection scheme needs to evaluate only 2qMNK decision stumps with some overhead computation while maintaining the dependence among the output dimensions to some extent.

The IBR described above has two drawbacks. First, it is restrictive to use the subspace regularization term ∥μ−g(x_(n))∥_(B) ² in (7), which amounts to a multivariate Gaussian assumption about the output variable that often manifests a non-Gaussian structure for real data. As a result, the generalization capability is hampered. Second, the weak function h(x) can be too “weak” as it consists of several 1D binary decision stumps h_(j)(x) sharing the same weight coefficient α. Consequently, the training procedure can take a long time, and the trained regression function uses too many weak functions, which can affect the running speed. IBRR, which is described in detail below, overcomes the drawbacks of IBR by replacing the subspace regularization and enhancing the modeling strength of the weak function.

Instead of using the 1D decision stumps as primitives, the IBRR method according to an embodiment of the present invention uses regression stumps. FIG. 4B illustrates a regression stump. As illustrated in FIG. 4B, a regression stump h(x;μ) is defined as:

$\begin{matrix}{{h\left( {x;\mu} \right)} = {\sum\limits_{k = {1:K}}{w_{k}\left\lbrack {f\left( {x;\mu} \right) \in R_{k}} \right\rbrack}} = {e\left( {x;\mu} \right)}^{T}w,} & (15)\end{matrix}$
where [.] is an indicator function, ƒ(x;μ) is the response function of the local rectangle feature with attribute μ, and {R_(k); k=1, 2, . . . , K} are evenly spaced intervals. In (15), all the weights w_(k) are compactly encoded by a weight vector w_(K×1)=[w₁, w₂, . . . , w_(K)]^(T) and the vector e(x;μ) is some column of the identity matrix: only one element is one and all others are zero. Similarly, the weak function h(x)_(q×1) is constructed by stacking q different 1D regression stumps, i.e.,

$\begin{matrix}{{h\left( {x;\mu_{1},\ldots,\mu_{q}} \right)} = \left\lbrack {{{e_{1}\left( {x;\mu_{1}} \right)}^{T}w_{1}},\ldots,{{e_{q}\left( {x;\mu_{q}} \right)}^{T}w_{q}}} \right\rbrack^{T},} & (16)\end{matrix}$
where w_(j) is the weight vector for the j^(th) regression stump h_(j)(x;μ_(j)). The weights belonging to all regression stumps can be further encoded into a weight matrix W_(K×q)=[w₁, w₂, . . . , w_(q)]. Since we now use the weights, we drop the common coefficient α in the regression output function defined in (8), and instead express the regression function as follows:

$\begin{matrix}{{g_{t}(x)} = {{{g_{t - 1}(x)} + {h_{t}(x)}} = {\sum\limits_{i = {1:t}}{{h_{i}(x)}.}}}} & (17)\end{matrix}$
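A regression stump in the sense of equation (15) can be sketched in Python as a lookup of one weight per interval. The names below are hypothetical, and the sketch assumes the K evenly spaced intervals R_k partition a known response range [lo, hi], with out-of-range responses clamped to the first or last interval.

```python
def bin_index(value, lo, hi, num_bins):
    """Index k of the evenly spaced interval R_k that contains the feature response."""
    k = int((value - lo) / (hi - lo) * num_bins)
    return max(0, min(num_bins - 1, k))  # clamp responses outside [lo, hi)


def indicator_vector(value, lo, hi, num_bins):
    """e(x; mu): a column of the identity matrix selecting the active interval."""
    e = [0] * num_bins
    e[bin_index(value, lo, hi, num_bins)] = 1
    return e


def regression_stump(value, weights, lo, hi):
    """Equation (15): h(x; mu) = e(x; mu)^T w."""
    e = indicator_vector(value, lo, hi, len(weights))
    return sum(e_k * w_k for e_k, w_k in zip(e, weights))
```

Unlike the binary decision stump, which can only output +1 or −1, the regression stump outputs one of K real-valued levels, so a single weak function can approximate a piecewise-constant response.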

It is easy to verify that a regression stump can be formed by combining multiple decision stumps. Such a combination strengthens the modeling power of weak functions and consequently accelerates the training process. Empirical evidence shows that the training time is almost inversely proportional to the number of levels used in the weak function. Although using the regression stump brings the risk of overfitting, this risk can be ameliorated by considering the model complexity of the regression stump.

Ridge regression, also known as Tikhonov regularization, is a method of regularization for an ill-conditioned system of linear equations. According to an embodiment of the present invention, ridge regression principles are adopted into a boosting framework in order to train a regression function using IBRR.

The model complexity of the regression output function g_(t)(x) depends on its weight matrices {W₁, W₂, . . . , W_(t)}. Because boosting regression proceeds iteratively, at the t^(th) boosting iteration, the following ridge regression task is performed that only involves the weight matrix W_(t) (dropping the subscript t for notational clarity):

$\begin{matrix}{\arg\min\limits_{W}\left\{ {{J(g)} = {\sum\limits_{n = {1:N}}\left\{ \left\| {r\left( x_{n} \right) - h\left( x_{n} \right)} \right\|_{A}^{2} \right\}} + \lambda\left\| W \right\|_{B}^{2}} \right\}.} & (18)\end{matrix}$

Because the weight vectors {w₁, w₂, . . . , w_(q)} in the weight matrix W are associated with q different local rectangle features, the optimization in (18) implies two subtasks:

1. Given a set of q features with attributes μ₁, . . . , μ_(q), respectively, find the optimal matrix Ŵ(μ₁, . . . , μ_(q)) and the minimum cost Ĵ(μ₁, . . . , μ_(q)); and

2. Find the optimal set of q features with respective attributes μ₁, . . . , μ_(q) that minimizes the minimum cost Ĵ(μ₁, . . . , μ_(q)). This corresponds to feature selection.

The optimization in (18) necessitates a greedy feature selection that may be computationally unmanageable. Accordingly, it may be advantageous to resort to a suboptimal, yet computationally amenable, incremental feature selection method. To this end, we introduce the following “incremental” vectors and matrices:

${A^{i} = \begin{bmatrix}A^{i - 1} & a^{i - 1} \\{a^{i - 1}}^{T} & a_{i}\end{bmatrix}},\quad{h^{i} = \begin{bmatrix}h^{i - 1} \\h_{i}\end{bmatrix}},\quad{r^{i} = {\begin{bmatrix}r^{i - 1} \\r_{i}\end{bmatrix}.}}$

Assuming that features have been selected up to i−1, that is, the incremental vector h^(i−1)(x;μ₁, . . . , μ_(i−1)) and the weight vectors w₁, . . . , w_(i−1) are known, the IBRR method aims to find the weak function h_(i)(x;μ_(i))=e_(i)(x;μ_(i))^(T)w_(i) that minimizes the following ridge regression cost J^(i)(μ_(i),w_(i)) (referred to herein as the IBRR cost function):

$\begin{matrix}{{J^{i}\left( {\mu_{i},w_{i}} \right)} = {\sum\limits_{n = {1:N}}\left\{ {\left\| {r^{i}\left( x_{n} \right) - h^{i}\left( x_{n} \right)} \right\|_{A^{i}}^{2} + \lambda\left\| w_{i} \right\|_{B}^{2}} \right\}}.} & (19)\end{matrix}$
It can be derived that, for a fixed μ_(i), the optimal weight vector is:

$\begin{matrix}{{{\hat{w}}_{i}\left( \mu_{i} \right)} = {\Gamma_{i}\left( \mu_{i} \right)}^{- 1}\tau_{i}\left( \mu_{i} \right),\quad\text{where}} & (20) \\{{\Gamma_{i}\left( \mu_{i} \right)} = {\lambda B} + {\sum\limits_{n = {1:N}}\left\{ {{e_{i}\left( {x_{n};\mu_{i}} \right)}a_{i}{e_{i}\left( {x_{n};\mu_{i}} \right)}^{T}} \right\}},} & (21) \\{{\tau_{i}\left( \mu_{i} \right)} = {\sum\limits_{n = {1:N}}{{e_{i}\left( {x_{n};\mu_{i}} \right)}\left\{ {{\left( {r^{i - 1}\left( x_{n} \right) - h^{i - 1}\left( x_{n} \right)} \right)^{T}a^{i - 1}} + {{r_{i}\left( x_{n} \right)}^{T}a_{i}}} \right\}}}.} & (22)\end{matrix}$
Accordingly, the IBRR method searches for the optimal μ_(i) to minimize the IBRR cost function J^(i)(μ_(i),ŵ_(i)(μ_(i))).

When A=B=I_(q), the incremental feature selection gives the optimal solution. In this case, the optimal weight w_(j,k) for the j^(th) weak function is the weighted average:

$\begin{matrix}{w_{j,k} = {\frac{\sum\limits_{n = 1}^{N}{{r_{j}\left( x_{n} \right)}\left\lbrack {{f\left( {x_{n};\mu_{j}} \right)} \in R_{k}} \right\rbrack}}{\lambda + {\sum\limits_{n = 1}^{N}\left\lbrack {{f\left( {x_{n};\mu_{j}} \right)} \in R_{k}} \right\rbrack}}.}} & (23)\end{matrix}$
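Equation (23) amounts to a regularized per-interval average of the residuals. A minimal Python sketch, assuming the interval index of each training sample has already been computed for the chosen feature (the function and argument names are hypothetical), is:

```python
def ridge_bin_weights(bin_indices, residuals, num_bins, lam):
    """Equation (23): w_k = sum of residuals falling in R_k / (lambda + count in R_k).

    Assumes lam > 0 so that empty intervals receive a weight of zero
    instead of causing a division by zero.
    """
    sums = [0.0] * num_bins
    counts = [0] * num_bins
    for k, r in zip(bin_indices, residuals):
        sums[k] += r
        counts[k] += 1
    return [s / (lam + c) for s, c in zip(sums, counts)]
```

The regularization coefficient λ shrinks every weight toward zero, and the shrinkage is strongest for intervals that contain few training samples, which is how the ridge term limits overfitting of sparsely populated intervals.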

The order of the dimensions of the output variable can be randomly permuted in order to improve robustness and remove bias. It is also possible to improve efficiency by randomly sampling the dictionary set, i.e., replacing M with a smaller M′, and randomly sampling the training data set, i.e., replacing N with a smaller N′.

FIG. 5A illustrates a method of training a regression function using IBRR according to an embodiment of the present invention. FIG. 5B is pseudo code for implementing the method of FIG. 5A. At step 502, the IBRR tuning parameters are initialized. Step 502 is shown at 552 of FIG. 5B. As shown in FIG. 5B, the normalization matrices A and B, the regularization coefficient λ, and the shrinkage factor η are set. These may be set automatically or manually by a user. Stopping criteria, such as a maximum number of iterations T_(max) and a minimum cost function value J_(min) are also set. Furthermore, the initial values t=0, g₀(x)=0, and r₀(x)=y(x) are also set.

At step 504, an optimal weak function is determined based on a set of image features. The optimal weak function is determined to minimize the IBRR cost function (19). Step 504 is shown at 554 of FIG. 5B, and described in greater detail in FIGS. 6A and 6B.

At step 506, the regression function is updated based on the optimal weak function determined in step 504. Step 506 is shown at 556 of FIG. 5B. As shown in (17), the regression function is updated at each iteration by adding the optimal weak function for that iteration to the prior regression function, such that the final regression function is the sum of the weak functions for all of the iterations. Accordingly, when the weak function is determined, it is added to the prior regression function.

At step 508, the approximation error and IBRR cost function are evaluated based on the updated regression function. The approximation error tests the regression function by comparing the difference vectors that the regression function produces for the input training data to the known output training data. The IBRR cost function is expressed in (19).

At step 510, it is determined whether the IBRR method has converged. Step 510 is shown at 560 of FIG. 5B. To decide whether the method has converged, it is determined whether a stop condition is met. For example, convergence can be achieved if the cost function is less than the minimum cost function J_(min) as shown in FIG. 5B. It is also possible that convergence is achieved when the maximum number of iterations T_(max) has occurred, when the approximation error r_(t)(x) is less than a certain threshold, when the difference between the cost function at the previous step and the current step is less than a certain threshold, or when the difference between the approximation error at the previous step and the current step is less than a certain threshold. If the IBRR method has not converged at step 510, the method returns to step 504 and repeats steps 504, 506, and 508 until convergence is achieved. If the IBRR method has converged at step 510, the method proceeds to step 512.

At step 512, the regression function is stored or output. The trained regression function resulting from the method can be stored in a memory or storage of a computer system or output for use in detecting anatomic structures in medical images.
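The loop of steps 502 through 512 can be illustrated with a deliberately simplified Python sketch. It is not the pseudo code of FIG. 5B: it assumes a scalar output (q=1), A=B=1 so that the weights follow (23), feature responses precomputed and scaled to [0, 1), the regularization term applied once per weak function as in (18), the shrinkage factor η applied to each weak function, and the squared residual as the stopping quantity. All function names are hypothetical.

```python
def bin_of(value, num_bins):
    """Interval index for a feature response assumed to lie in [0, 1)."""
    return max(0, min(num_bins - 1, int(value * num_bins)))


def train_ibrr_1d(features, targets, num_bins, lam=0.1, eta=0.5, t_max=50, j_min=1e-6):
    """features[m][n] is the response of feature m on training sample n."""
    bins = [[bin_of(f, num_bins) for f in row] for row in features]
    residual = list(targets)  # r_0(x) = y(x) because g_0(x) = 0
    model = []                # one (feature index, interval weights) pair per iteration
    for _ in range(t_max):
        best = None
        for m, row in enumerate(bins):  # step 504: search the feature dictionary
            sums, counts = [0.0] * num_bins, [0] * num_bins
            for k, r in zip(row, residual):
                sums[k] += r
                counts[k] += 1
            w = [s / (lam + c) for s, c in zip(sums, counts)]
            cost = sum((r - w[k]) ** 2 for k, r in zip(row, residual))
            cost += lam * sum(v * v for v in w)
            if best is None or cost < best[0]:
                best = (cost, m, w)
        _, m, w = best
        w = [eta * v for v in w]  # assumed use of the shrinkage factor
        model.append((m, w))      # step 506: add the weak function
        residual = [r - w[k] for k, r in zip(bins[m], residual)]
        if sum(r * r for r in residual) < j_min:  # steps 508 and 510
            break
    return model                  # step 512


def predict(model, feature_values, num_bins):
    """Evaluate g(x) as the sum of the selected regression stumps, as in (17)."""
    return sum(w[bin_of(feature_values[m], num_bins)] for m, w in model)
```

In this toy setting each iteration removes a fixed fraction of the remaining residual, so the shrinkage factor trades the number of iterations against the smoothness of the fitted function.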

FIG. 6A illustrates a feature selection method according to an embodiment of the present invention. This method corresponds to step 504 of FIG. 5A. FIG. 6B is pseudo code for implementing the method of FIG. 6A. At step 602, the dimension of the weak function is set. As described above, the order of the dimensions of the weak function can be permuted in order to improve robustness and remove bias. The dimension of the weak function determines how many image features are used to determine each weak function. Step 602 is shown at 652 of FIG. 6B.

At step 604, an image feature is selected to minimize the IBRR cost function (19). Step 604 is shown at 654 of FIG. 6B. The image feature is selected by looping over each of the image features in the feature dictionary set M to find the feature that most reduces the IBRR cost function (19). As described above, each image feature is a local rectangle feature that is used by a 1D regression stump. As shown at 653 of FIG. 6B, it is possible that a reduced dictionary set M′ be sampled from the dictionary set M and used in place of M for feature selection in order to improve computational efficiency. Also, as shown at 653 of FIG. 6B, a reduced set N′ of training data may be sampled from the training set N in order to improve computational efficiency.

At step 606, the weak function is updated. As described above, the weakfunction is constructed by stacking q different 1D regression stumps.Each of the regression stumps uses an image feature. Once an imagefeature is selected at step 604, the weak function augments the currentfeature to previously selected features in an incremental fashion. Step606 is shown at 656 of FIG. 6B.

At step 608, it is determined whether the number of features selectedfor the current weak function is less than the dimension of the weakfunction. If the number of features selected is less than the dimensionq, the method returns to step 604 and repeats steps 604 and 606 until qimage features have been selected for the weak function. Once q imagefeatures have been selected, and the number of selected image featuresis no longer less than the dimension of the weak function, the methodproceeds to step 610.

At step 610, the weak function is output or stored. The weak function resulting from the combination of the selected features can be output for continued use in the IBRR method of FIG. 5A. The weak function may be stored in memory or storage of a computer system. The weak functions stored in each iteration of the IBRR method of FIG. 5A can be combined to generate the regression function.
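By way of illustration only, the loop of steps 602 through 610 can be sketched in Python as follows. This sketch is not the pseudo code of FIG. 6B. It rests on several assumptions: the per-dimension squared residual stands in for the cost function (19), the stump thresholds at the median feature response rather than searching for a threshold, one stump is assigned to each output dimension in a randomly permuted order, and the names `fit_stump`, `stump_predict`, `select_weak_function`, and `dict_sample` are hypothetical.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs) if xs else 0.0


def fit_stump(values, targets):
    """Simplified 1D regression stump: split at the median feature response
    and predict the mean target on each side of the split."""
    thr = sorted(values)[len(values) // 2]
    low = _mean([t for v, t in zip(values, targets) if v < thr])
    high = _mean([t for v, t in zip(values, targets) if v >= thr])
    return thr, low, high


def stump_predict(stump, value):
    thr, low, high = stump
    return low if value < thr else high


def select_weak_function(features, residuals, rng, dict_sample=None):
    """features: N rows of feature responses (one column per dictionary entry).
    residuals: N rows of q-dimensional targets still to be explained.
    Returns {output dimension: (feature index, stump)}."""
    q = len(residuals[0])
    order = list(range(q))
    rng.shuffle(order)                       # step 602: permute the dimensions
    n_feat = len(features[0])
    weak = {}
    for dim in order:                        # step 608: repeat until q features
        candidates = range(n_feat)
        if dict_sample is not None:          # optional reduced dictionary M'
            candidates = rng.sample(range(n_feat), dict_sample)
        targets = [row[dim] for row in residuals]
        best = None
        for j in candidates:                 # step 604: scan the dictionary
            values = [row[j] for row in features]
            stump = fit_stump(values, targets)
            # Stand-in cost (assumption): squared residual, NOT equation (19).
            cost = sum((t - stump_predict(stump, v)) ** 2
                       for v, t in zip(values, targets))
            if best is None or cost < best[0]:
                best = (cost, j, stump)
        weak[dim] = (best[1], best[2])       # step 606: augment weak function
    return weak                              # step 610: output the weak function
```

In this sketch the returned dictionary plays the role of the stacked stumps. Substituting the actual cost function (19) for the stand-in would change only the line that computes `cost`.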

Once a regression function is learned using the methods of FIGS. 5A and 6A, the regression function can be used to detect objects, such as anatomic structures, in medical images. Although only one scanned image patch is needed to predict a location (parameter) of the target object using the learned regression function, it is possible to utilize multiple scans to increase the accuracy and robustness of the estimated object location. FIG. 7 illustrates a method for detecting a target object in a medical image using a trained regression function according to an embodiment of the present invention.

At step 702, multiple image patches are scanned. According to a possible implementation, the method can scan a certain number of random image patches. Accordingly, M random image patches can be scanned at positions $\{\theta^{<1>}, \theta^{<2>}, \ldots, \theta^{<M>}\}$. As described above, the location or position of the image patches can be expressed using the 2D parameter or the 5D parameter.

At step 704, a difference vector is determined for each image patch using the trained regression function. Accordingly, for each $\theta^{<m>}$, the trained regression function is invoked to determine a difference vector $d\theta^{<m>}$. The difference vector (or difference parameter) $d\theta^{<m>}$ represents the difference between the location $\theta^{<m>}$ of the image patch and a location $\theta_{0}^{<m>}$ of the target object in the medical image. This step can be expressed as:

$\begin{matrix}{d\theta^{< m >} = F\left( I\left( \theta^{< m >} \right) \right),\quad m = 1,2,\ldots,M} & (24)\end{matrix}$

where F is the learned regression function.

At step 706, a target parameter $\theta_{0}^{<m>}$ (i.e., the location of the target object in the medical image) is predicted corresponding to each image patch based on the difference vector determined for the image patch. Accordingly, multiple target parameters are predicted corresponding to the multiple image patches. This step can be expressed as:

$\begin{matrix}{\theta_{0}^{< m >} = \theta^{< m >} + d\theta^{< m >},\quad m = 1,2,\ldots,M} & (25)\end{matrix}$

At step 708, a mean value of all of the predicted target values $\theta_{0}^{<m>}$ is calculated in order to determine a final estimate of the location of the target object. The M predictions $\{\theta_{0}^{<m>};\ m = 1, 2, \ldots, M\}$ are treated as independent and their mean value is calculated as the final estimate $\hat{\theta}_{0}$ for the ground truth parameter, as follows:

$\begin{matrix}{{\hat{\theta}}_{0} = {M^{- 1}{\sum\limits_{m = {1:M}}{\theta_{0}^{< m >}.}}}} & (26)\end{matrix}$
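As a minimal sketch of steps 702 through 708, equations (24) to (26) can be written in Python as shown below. The function name `detect_by_mean` and the callables `patch_at` (returning the image patch $I(\theta)$ for a parameter $\theta$) and `regress` (standing for the learned regression function F) are hypothetical names introduced for illustration, and parameters are represented as plain tuples of numbers.

```python
def detect_by_mean(positions, patch_at, regress):
    """positions: the scanned parameters theta^<m>, each a tuple (2D or 5D).
    patch_at(theta): returns the image patch I(theta) (assumed helper).
    regress(patch): the trained regression function F (assumed callable).
    Returns the final estimate and the list of per-patch predictions."""
    predictions = []
    for theta in positions:
        d_theta = regress(patch_at(theta))                  # equation (24)
        theta_0 = tuple(t + d for t, d in zip(theta, d_theta))
        predictions.append(theta_0)                         # equation (25)
    m = len(predictions)
    dims = len(predictions[0])
    estimate = tuple(sum(p[k] for p in predictions) / m     # equation (26)
                     for k in range(dims))
    return estimate, predictions
```

The same function serves the 2D and the 5D parameterization, since the arithmetic is applied component by component.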

One possible disadvantage of the IBRR method of the present invention is that it lacks a confidence measure, i.e., the regression function provides no confidence regarding its prediction. In order to provide a confidence score, it is possible to learn a binary detector D specialized for the anatomy of interest. After finding the m-th prediction $\theta_{0}^{<m>}$ in the method of FIG. 7, it is possible to apply the detector D to the image patch $I(\theta_{0}^{<m>})$. If the detector D fails, then the m-th sample can be discarded; otherwise, the confidence score $c^{<m>}$ resulting from the detector D is kept. Accordingly, we have a weighted set $\{(\theta_{0}^{<j>}, c^{<j>});\ j = 1, 2, \ldots, J\}$ (note that $J \leq M$ as samples may be discarded), based on which the weighted mean can be calculated as the final estimate $\hat{\theta}_{0}$, as follows:

$\begin{matrix}{{\hat{\theta}}_{0} = {\left\{ {\sum\limits_{j = {1:J}}{c^{< j >}\theta_{0}^{< j >}}} \right\}/{\left\{ {\sum\limits_{j = {1:J}}c^{< j >}} \right\}.}}} & (27)\end{matrix}$

In practice, it is possible to specify a value $J_{valid}$ such that scanning is stopped when $J \geq J_{valid}$ in order to reduce computation. If no prediction $\theta_{0}^{<m>}$ passes the detector D, equation (26) can be used instead of equation (27) to obtain the final estimate, as described in step 708 of FIG. 7.
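The confidence-weighted variant can be sketched in Python as follows, under stated assumptions. The detector D is modeled as a hypothetical callable `detector` that returns a positive confidence score when it accepts a patch and `None` when it rejects it. The names `detect_with_confidence`, `patch_at`, `regress`, and `j_valid` are likewise illustrative and do not come from the figures.

```python
def detect_with_confidence(positions, patch_at, regress, detector, j_valid=None):
    """Weighted-mean estimate of equation (27), with early stopping at
    j_valid accepted samples and a fallback to equation (26)."""
    kept = []        # (prediction, confidence) pairs accepted by the detector
    all_preds = []   # every prediction, kept for the fallback
    for theta in positions:
        d_theta = regress(patch_at(theta))                   # equation (24)
        pred = tuple(t + d for t, d in zip(theta, d_theta))  # equation (25)
        all_preds.append(pred)
        conf = detector(patch_at(pred))   # assumed: None means "rejected"
        if conf is not None:
            kept.append((pred, conf))
            if j_valid is not None and len(kept) >= j_valid:
                break                     # stop scanning once J >= J_valid
    dims = len(all_preds[0])
    if not kept:
        # No prediction passed the detector: plain mean, equation (26).
        return tuple(sum(p[k] for p in all_preds) / len(all_preds)
                     for k in range(dims))
    total = sum(c for _, c in kept)
    return tuple(sum(c * p[k] for p, c in kept) / total      # equation (27)
                 for k in range(dims))
```

Rejected samples carry no weight in the estimate, so a prediction that lands far from the anatomy of interest does not pull the final estimate toward it, provided the detector rejects it.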

FIG. 8 illustrates exemplary anatomic structure detection results using a 2D parameterization according to an embodiment of the present invention. As illustrated in FIG. 8, the above described methods are used to detect a location of the left ventricle in echocardiogram images 810, 820, 830, 840, 850, and 860. Using the 2D parameterization, a center point 812, 822, 832, 842, 852, and 862 of the left ventricle is estimated in the images 810, 820, 830, 840, 850, and 860, respectively. In images 810, 830, and 850, the center points 812, 832, and 852 are determined based on the mean of 100 predicted target outputs using equation (26). In images 820, 840, and 860, the center points 822, 842, and 862 are determined based on only predicted outputs that pass a binary detector, using equation (27). The curve surrounding the center points 812, 822, 832, 842, 852, and 862 in each of the images 810, 820, 830, 840, 850, and 860 represents a 95% confidence curve.

FIG. 9 illustrates exemplary anatomic structure detection results using a 5D parameterization according to an embodiment of the present invention. As illustrated in FIG. 9, the above described methods are used to detect a location (position, orientation, and scale) of the left ventricle in echocardiogram images 910, 920, 930, 940, 950, and 960. In the images 910, 920, 930, 940, 950, and 960, a window 912, 922, 932, 942, 952, and 962 representing the location of the left ventricle is determined. As shown in the images 920, 930, 940, 950, and 960, three boxes are shown for each of the detected windows 912, 922, 932, 942, 952, and 962. The lightest grey box represents the detection result using a binary classifier in addition to the IBRR trained regression function (equation (27)). The medium grey box (mostly overlapping the lightest grey box in each image) represents the detection result using the IBRR trained regression function without a binary classifier (equation (26)). The darkest grey box represents the ground truth location of the target for each image 920, 930, 940, 950, and 960.

The above-described methods for training a regression function and object detection using a trained regression function may be implemented on a computer using well-known computer processors, memory units, storage devices, computer software, and other components. A high level block diagram of such a computer is illustrated in FIG. 10. Computer 1002 contains a processor 1004 which controls the overall operation of the computer 1002 by executing computer program instructions which define such operation. The computer program instructions may be stored in a storage device 1012 (e.g., magnetic disk) and loaded into memory 1010 when execution of the computer program instructions is desired. Thus, applications for training a regression function using IBRR and predicting an object location using a trained regression function may be defined by the computer program instructions stored in the memory 1010 and/or storage 1012 and controlled by the processor 1004 executing the computer program instructions. Furthermore, training data, testing data, the trained regression function, and data resulting from object detection using the trained regression function can be stored in the storage 1012 and/or the memory 1010. The computer 1002 also includes one or more network interfaces 1006 for communicating with other devices via a network. The computer 1002 also includes other input/output devices 1008 that enable user interaction with the computer 1002 (e.g., display, keyboard, mouse, speakers, buttons, etc.). One skilled in the art will recognize that an implementation of an actual computer could contain other components as well, and that FIG. 10 is a high level representation of some of the components of such a computer for illustrative purposes.

The foregoing Detailed Description is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.

1. A method for detecting a target object in a medical image, comprising: determining a difference vector corresponding to at least one image patch of a medical image using a trained regression function, said difference vector representing a difference between a location of said at least one image patch and a location of a target object in said medical image; and predicting a location of said target object in said medical image based on said difference vector and predicting a position of a center point of said target object.
2. The method of claim 1, further comprising: receiving said at least one image patch.
3. The method of claim 1, further comprising: scanning a plurality of image patches at random locations in said medical image prior to said determining step.
4. The method of claim 3, wherein said step of determining a difference vector corresponding to at least one image patch comprises: determining a plurality of difference vectors, each corresponding to one of said plurality of image patches, using said trained regression function.
5. The method of claim 4, wherein said step of predicting a location of said target object comprises: calculating a plurality of target object location predictions, each based on one of said plurality of difference vectors; and calculating a mean of the plurality of target object location predictions.
6. The method of claim 4, wherein said step of predicting a location of said target object comprises: calculating a plurality of target object location predictions, each based on one of said plurality of difference vectors; processing each of said plurality of target object location predictions using a trained binary classifier; and calculating a weighted mean of the plurality of target object location predictions based on results of said binary classifier.
7. The method of claim 1, wherein said step of predicting a location of said target object comprises: predicting a position, size, and orientation of said target object.
8. The method of claim 1, wherein said trained regression function comprises a plurality of weak functions, each weak function representing at least one image feature.
9. The method of claim 1, wherein said regression function is trained using image-based boosting ridge regression.
10. A method comprising: receiving training data including a plurality of image patches of a training image and a plurality of difference vectors, each of said plurality of difference vectors corresponding to one of said plurality of image patches and each of said plurality of difference vectors providing a difference between a location of the corresponding image patch and a target object in said training image; and training a regression function based on said training data using image-based boosting ridge regression (IBRR), said regression function for predicting a location of a target object in a medical image based on at least one input image patch of the medical image.
11. The method of claim 10, wherein said step of training said regression function based on said training data using IBRR comprises: (a) determining a weak function to minimize an IBRR cost function, said weak function based on at least one image feature; (b) updating said regression function based on said weak function; (c) evaluating approximation error and said IBRR cost function based on the updated regression function; and (d) repeating steps (a)-(c) until a predetermined stop condition is met.
12. The method of claim 11, wherein step (a) comprises: iteratively selecting a number of optimal image features based on said IBRR cost function, the number of image features corresponding to a dimension of said weak function; combining the selected image features to generate said weak function.
13. The method of claim 12, wherein each of said image features is used by a regression stump.
14. The method of claim 11, wherein step (d) comprises: repeating steps (a)-(c) until at least one of said approximation error and said IBRR cost function converges.
15. An apparatus for detecting a target object in a medical image, comprising: means for storing a trained regression function for predicting a location of a target object; means for processing at least one image patch of a medical image using said trained regression function to determine a difference vector corresponding to said at least one image patch, said difference vector representing a difference between a location of said at least one image patch and a location of said target object in said medical image; and means for predicting a location of said target object in said medical image based on said difference vector and predicting a position of a center point of said target object.
16. The apparatus of claim 15, further comprising: means for receiving a plurality of image patches of said medical image, wherein said means for processing at least one image patch comprises processing said plurality of image patches using said trained regression function to determine a plurality of difference vectors, each corresponding to one of said plurality of image patches.
17. The apparatus of claim 16, wherein said means for predicting a location of said target object comprises: means for calculating a plurality of target object location predictions, each based on one of said plurality of difference vectors; and means for calculating a mean of the plurality of target object location predictions.
18. An apparatus comprising: means for receiving training data including a plurality of image patches of a training image and a plurality of difference vectors, each of said plurality of difference vectors corresponding to one of said plurality of image patches and each of said plurality of difference vectors providing a difference between a location of the corresponding image patch and a target object in said training image; and means for training a regression function based on said training data using image-based boosting ridge regression (IBRR), said regression function for predicting a location of a target object in a medical image based on at least one input image patch of the medical image.
19. The apparatus of claim 18, wherein said means for training a regression function comprises: means for iteratively determining a plurality of weak functions to minimize an image-based boosting ridge regression (IBRR) cost function, each of said weak functions based on at least one image feature; means for combining said plurality of weak functions to generate said regression function.
20. The apparatus of claim 19, wherein said means for iteratively determining a plurality of weak functions comprises: means for iteratively selecting a number of optimal image features for a weak function based on said IBRR cost function, the number of image features corresponding to a dimension of the weak function; combining the selected image features to generate the weak function.
21. A computer readable medium encoded with computer executable instructions for detecting a target object in a medical image, the computer executable instructions defining steps comprising: determining a difference vector corresponding to at least one image patch of a medical image using a trained regression function, said difference vector representing a difference between a location of said at least one image patch and a location of a target object in said medical image; and predicting a location of said target object in said medical image based on said difference vector and predicting a position of a center point of said target object.
22. The computer readable medium of claim 21, further comprising computer executable instructions defining the step of: scanning a plurality of image patches at random locations in said medical image prior to said determining step, wherein the computer executable instructions defining the step of determining a difference vector corresponding to at least one image patch comprise computer executable instructions defining the step of determining a plurality of difference vectors, each corresponding to one of said plurality of image patches, using said trained regression function.
23. The computer readable medium of claim 22, wherein the computer executable instructions defining the step of predicting a location of said target object comprise computer executable instructions defining the steps of: calculating a plurality of target object location predictions, each based on one of said plurality of difference vectors; and calculating a mean of the plurality of target object location predictions.